This report is produced with R

The data set was provided by QUT

Analyze the pick up pattern by day

In this part we identify the day of the week for each pickup and count the pickups per day, so that we can find the busiest and the quietest day of the week. The data contain Uber pickup records in New York from May 2014 to September 2014. First, we take one of the months as an example. We add a day-of-week column to the data set, then use ggplot to display the count by day. It can be seen clearly that Wednesday is the busiest day and Sunday is the quietest.

library(ggplot2)

uber2<- read.csv("/Users/shung47/Documents/semester1 2017/Data manipulation/uber data/uber-raw-data-may14.csv")

uber<-uber2

# The timestamps carry a four-digit year, so it is parsed with %Y; %y would read only "20" (year 2020) and shift every weekday label by one day
uber$day<-weekdays(as.Date(uber$Date.Time,format="%m/%d/%Y"))

# Fix the level order so the days appear Monday to Sunday instead of alphabetically
fac<-factor(uber$day, levels=c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday"))
fac_table<-table(fac)
uber_t<-data.frame(fac_table)
names(uber_t)<-c("day","count")
uber_t
##         day  count
## 1    Monday  60861
## 2   Tuesday  91185
## 3 Wednesday 108631
## 4  Thursday  85067
## 5    Friday  90303
## 6  Saturday  77218
## 7    Sunday  51251
ggplot(uber_t,aes(x=day,y=count,group=1))+geom_point(alpha=0.5)+geom_line()
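
The weekday count can be tried on a few made-up timestamps before running it on the full file. The sketch below assumes the timestamp layout m/d/Y H:M:S and derives the weekday from the ISO weekday number (%u, 1 = Monday), so the labels do not depend on the system locale. The names toy, day_labels and day_counts are illustrative and not part of the original analysis.

```r
# Made-up pickup timestamps in the assumed "m/d/Y H:M:S" layout
toy <- data.frame(
  Date.Time = c("5/1/2014 0:02:00", "5/1/2014 7:15:00",
                "5/2/2014 17:40:00", "5/4/2014 9:05:00"),
  stringsAsFactors = FALSE
)

# %Y reads the four-digit year; the time part after the date is ignored by as.Date
d <- as.Date(toy$Date.Time, format = "%m/%d/%Y")

# %u gives the ISO weekday number (1 = Monday ... 7 = Sunday), independent of locale
day_labels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
toy$day <- factor(day_labels[as.integer(format(d, "%u"))], levels = day_labels)

# One row per weekday, including days with zero pickups
day_counts <- as.data.frame(table(day = toy$day), responseName = "count")
day_counts
```

Because the factor levels are fixed, days without any pickups still appear with a count of zero, which keeps the x axis of the plot complete.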

Analyze the pick up pattern by month

To find the trend, we count the pickups in each month. First, we import the other months and use “nrow” to count the records in each month, then combine the counts into one data frame. Finally, we use ggplot to show the result. The result shows that the number of Uber pickups increased every month.

uber3<- read.csv("/Users/shung47/Documents/semester1 2017/Data manipulation/uber data/uber-raw-data-jun14.csv")
uber4<- read.csv("/Users/shung47/Documents/semester1 2017/Data manipulation/uber data/uber-raw-data-jul14.csv")
uber5<- read.csv("/Users/shung47/Documents/semester1 2017/Data manipulation/uber data/uber-raw-data-aug14.csv")
uber6<- read.csv("/Users/shung47/Documents/semester1 2017/Data manipulation/uber data/uber-raw-data-sep14.csv")

May<-nrow(uber2)
Jun<-nrow(uber3)
Jul<-nrow(uber4)
Aug<-nrow(uber5)
Sep<-nrow(uber6)
Total<-rbind(May,Jun,Jul,Aug,Sep)
#mon<-c("May","Jun","Jul","Aug","Sep")
#Total2<-cbind(mon,Total)
dt<-data.frame(Total)
#names(dt)=c("Month","Count")
dt
##       Total
## May  564516
## Jun  663844
## Jul  796121
## Aug  829275
## Sep 1028136
#ggplot(dt,aes(x=Month,y=Count,fill="red"))+geom_bar(stat="identity")+labs(x="Month",y="Count")
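# Use the row names as an ordered factor so each bar is labelled with its month
dt$Month<-factor(rownames(dt),levels=rownames(dt))
ggplot(dt,aes(x=Month,y=Total))+geom_bar(stat="identity")+labs(x="Month",y="Count")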
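
The size of the increase can be read directly off the totals printed above. The following sketch re-enters those five counts by hand and computes the change from each month to the next; the names totals and growth_pct are illustrative.

```r
# Monthly pickup totals copied from the table printed above
totals <- c(May = 564516, Jun = 663844, Jul = 796121, Aug = 829275, Sep = 1028136)

# Percentage change from each month to the next
growth_pct <- 100 * diff(totals) / head(totals, -1)
names(growth_pct) <- paste(head(names(totals), -1), names(totals)[-1], sep = "-")
round(growth_pct, 1)

# Ratio of the last month to the first
unname(totals["Sep"] / totals["May"])
```

Every entry of growth_pct is positive, which is the numerical counterpart of the rising bars in the plot.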

Analyze the pick up pattern by time

First, we remove the date, minutes and seconds from the timestamp and use a factor to identify the hour, then use ggplot to plot the count per hour. The results indicate that pickups are lowest at 2-3 am, then increase until 7 am. After a slight fall, they keep increasing until 5-6 pm (hour 17), which is the peak time for calling Uber. Then they generally keep dropping until midnight.

uber_tt<-sub("[0-9]/[0-9]?[0-9]/2014 ","",uber$Date.Time)
uber_a<-sub(":[0-9][0-9]:00","",uber_tt)

uber$hour<-uber_a
#uber_time<-aggregate(x=list(amount=uber$hour), FUN=length, by=list(hour=uber$hour))
fac2<-factor(uber$hour,levels=as.character(0:23))
#uber_time2<-uber_time[order(uber_time$hour),]
fac2_t<-table(fac2)
dt2<-data.frame(fac2_t)
names(dt2)=c("hour","count")
dt2
##    hour count
## 1     0 11910
## 2     1  7769
## 3     2  4935
## 4     3  5040
## 5     4  6095
## 6     5  9476
## 7     6 18498
## 8     7 24924
## 9     8 22843
## 10    9 17939
## 11   10 17865
## 12   11 18774
## 13   12 19425
## 14   13 22603
## 15   14 27190
## 16   15 35324
## 17   16 42003
## 18   17 45475
## 19   18 43003
## 20   19 38923
## 21   20 36244
## 22   21 36964
## 23   22 30645
## 24   23 20649
ggplot(dt2, aes(x=hour,y=count,group=1))+geom_line()+geom_point()+labs(x="time")
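
As an alternative to stripping the date and minutes with regular expressions, the hour can be parsed from the timestamp itself. This is a sketch on made-up timestamps, assuming the layout m/d/Y H:M:S; unlike the pattern used above, it also handles months with two digits. The variable names are illustrative.

```r
# Made-up timestamps, including one with a two-digit month
stamps <- c("5/1/2014 0:11:00", "5/1/2014 7:45:00",
            "5/12/2014 17:05:00", "10/31/2014 23:59:00")

# Parse the full timestamp; a fixed time zone avoids daylight-saving surprises
parsed <- as.POSIXlt(stamps, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# $hour is a number from 0 to 23; fixing the levels keeps empty hours in the table
hours <- factor(parsed$hour, levels = 0:23)
hour_counts <- as.data.frame(table(hour = hours), responseName = "count")
head(hour_counts)
```

The resulting data frame has the same two columns as dt2, so it could be passed to the same ggplot call.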

Analyze the pick up distribution by day

We import the New York map with “get_map”, then overlay the pickup data on the map, with one map for each day of the week. According to the result, Manhattan has a higher pickup density than the other areas. However, the differences between days are not as obvious as in the line graph in the first part.

library(ggmap)
map <- get_map(location = 'New york', zoom = 12)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=New+york&zoom=12&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=New%20york&sensor=false
NYmap<-ggmap(map)
NYmap

NYmap+stat_density2d(data=uber,aes(x=Lon, y=Lat, fill = ..level.., alpha=..level..),geom="polygon",size=2,bins=10)+scale_fill_gradient("Density")+scale_alpha(range = c(.4, .75), guide = FALSE)+guides(fill = guide_colorbar(barwidth = 1.5, barheight = 10))+facet_wrap(~day)
## Warning: Removed 48058 rows containing non-finite values (stat_density2d).
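
The warning most likely arises because pickups that lie outside the extent of the map cannot be drawn and are dropped. A sketch of how such rows could be counted and removed beforehand follows. The bounding box values and the pts data frame are made up for illustration; with ggmap, the actual extent of a downloaded map is stored in attr(map, "bb").

```r
# Hypothetical map extent in degrees; not the actual extent of the map above
bbox <- c(left = -74.10, right = -73.85, bottom = 40.60, top = 40.85)

# Made-up pickup coordinates: two inside the box, two outside
pts <- data.frame(Lat = c(40.75, 40.70, 41.20, 40.65),
                  Lon = c(-73.98, -74.00, -73.90, -73.50))

# TRUE for points that fall within the box on both axes
inside <- pts$Lon >= bbox["left"]   & pts$Lon <= bbox["right"] &
          pts$Lat >= bbox["bottom"] & pts$Lat <= bbox["top"]

sum(!inside)             # number of rows that would not be drawn
pts_in <- pts[inside, ]  # rows that fall on the map
```

Filtering first makes the number of excluded pickups explicit instead of leaving it to a plotting warning.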

Analyze the pick up distribution by month

This part is similar to the previous one. There is no obvious difference on the map between the months.

map <- get_map(location = 'New york', zoom = 12)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=New+york&zoom=12&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=New%20york&sensor=false
NYmap<-ggmap(map)

uber2$month<-"May"
uber3$month<-"Jun"
uber4$month<-"Jul"
uber5$month<-"Aug"
uber6$month<-"Sep"
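uberA<- rbind(uber2,uber3,uber4,uber5,uber6)
# Keep the facets in calendar order; a plain character column would be sorted alphabetically
uberA$month<-factor(uberA$month,levels=c("May","Jun","Jul","Aug","Sep"))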

NYmap+stat_density2d(data=uberA,aes(x=Lon, y=Lat, fill = ..level.., alpha=..level..),geom="polygon",size=2,bins=10)+scale_fill_gradient("Density")+scale_alpha(range = c(.4, .75), guide = FALSE)+guides(fill = guide_colorbar(barwidth = 1.5, barheight = 10))+facet_wrap(~month)
## Warning: Removed 434717 rows containing non-finite values (stat_density2d).

Analyze the pick up distribution by time

We can get valuable information from this part: it shows where and when people call Uber most frequently during the day. Uber drivers can follow this pattern to increase their chance of getting passengers.

map <- get_map(location = 'New york', zoom = 12)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=New+york&zoom=12&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=New%20york&sensor=false
NYmap<-ggmap(map)
NYmap+stat_density2d(data=uber,aes(x=Lon, y=Lat, fill = ..level.., alpha=..level..),geom="polygon",size=2,bins=10)+scale_fill_gradient("Density")+scale_alpha(range = c(.4, .75), guide = FALSE)+guides(fill = guide_colorbar(barwidth = 1.5, barheight = 10))+facet_wrap(~hour)
## Warning: Removed 48058 rows containing non-finite values (stat_density2d).
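
To turn the maps into something a driver could look up, the busiest location for each hour can also be tabulated. The sketch below uses made-up coordinates, rounds them to two decimal places to form grid cells (an arbitrary choice of cell size), and keeps the most frequent cell per hour. All names are illustrative.

```r
# Made-up pickups for two hours of the day
pickups <- data.frame(
  hour = c(8, 8, 8, 17, 17, 17, 17),
  Lat  = c(40.751, 40.749, 40.712, 40.758, 40.761, 40.759, 40.706),
  Lon  = c(-73.994, -73.991, -74.006, -73.984, -73.983, -73.982, -74.009)
)

# Snap each pickup to a grid cell of 0.01 degrees
pickups$cell <- paste(round(pickups$Lat, 2), round(pickups$Lon, 2), sep = ",")

# Count pickups per hour and cell
cell_counts <- aggregate(n ~ hour + cell, data = transform(pickups, n = 1), FUN = sum)

# Sort by hour and descending count, then keep the first (busiest) cell per hour
ranked <- cell_counts[order(cell_counts$hour, -cell_counts$n), ]
busiest <- ranked[!duplicated(ranked$hour), ]
busiest
```

A smaller rounding step gives finer cells but fewer pickups per cell, so the cell size would need to be chosen with the amount of data in mind.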

Conclusion

According to the data, we know when and where people called Uber most. Furthermore, the trend in pickups over these months can be observed and used to predict future demand. Based on such a prediction, Uber can allocate drivers to satisfy customers’ demand and maximize profit.